body {
font-family: 'Roboto', sans-serif;
color: #555; /* Darkened text color */
background-color: #D6EAF8 ; /* Lightened background color */
}
h1 {
color: #2E86C1;
font-size: 28px;
}
p {
font-size: 18px; /* Increased paragraph font size */
}
I was looking for pictures of lovely dogs and wanted to explore the data behind them to see the relationship between views, likes, and downloads, so I used “lovely dog” as my keywords to search for photos on the website. In the results, the photo with the most views (1,813,693) and the most downloads (1,249,271) shows a man, a woman, and a dog together. Most of the photos show the relationship between people and dogs, and two of them are cartoon pictures. The most common words used in the tags are “pet” and “dog”. The main colour in these pictures is green, as many photos have a background of grass or forest.
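The claim about the most common tag words could be checked with a count like the one below. This is only a minimal sketch: `toy_photos` is a made-up stand-in for the real data, and it assumes the `tags` column holds comma-separated strings.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical example data; the real tags come from the Pixabay JSON
toy_photos <- tibble(tags = c("dog, pet, grass", "dog, puppy", "pet, cat, dog"))

tag_counts <- toy_photos %>%
  separate_rows(tags, sep = ",") %>%               # one row per tag
  mutate(tags = str_to_lower(str_trim(tags))) %>%  # tidy whitespace and case
  count(tags, sort = TRUE)                         # most frequent tags first

tag_counts
```

With the real data, the same pipeline could be run on `selected_photos` instead of `toy_photos`.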
The screenshot of the first few rows of the Royalty free photos is:
The HTML table is:
photo_data %>%
select(pageURL) %>%
knitr::kable()
The gif of my photos is:
In conclusion, there are 51 rows and 7 columns in my data.
The maximum number of downloads is 1248458.
The mean number of likes is 221.
The median number of views is 43530.
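# label each photo by its level of likes (labels match the likes > 100 test)
photo_data <- photo_data %>%
  mutate(logic_like = ifelse(likes > 100,
                             "> 100",
                             "<= 100"))
# remove the outlier (the photo with the most downloads)
photo_data <- photo_data %>%
  filter(downloads < max(downloads))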
# plot the relationship between views and downloads, split by level of likes
plot <- ggplot(photo_data, aes(x = views,
                               y = downloads,
                               colour = logic_like)) +
  geom_point(pch = 10) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(vars(logic_like), scales = "free") +
  labs(title = "Downloads vs Views",
       x = "Views",
       y = "Downloads")
plot
Because I want to see the relationship between views, downloads, and likes, I grouped the data by level of likes. I found one outlier that would distort the results, so I removed it. Then I used ggplot to draw one plot for each of the two levels of likes, and I added a line to each plot to show the trend in the data.
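One possible way to put a number on the trend in each group is a correlation between views and downloads per level of likes. This was not part of the original analysis; it is a sketch on made-up values, and `toy`, `like_level`, and `cor_by_group` are hypothetical names.

```r
library(dplyr)

# Hypothetical example data, not the real Pixabay values
toy <- tibble(
  views     = c(100, 200, 300, 400, 500, 600),
  downloads = c(50, 110, 140, 210, 240, 310),
  likes     = c(20, 150, 40, 300, 60, 500)
)

cor_by_group <- toy %>%
  mutate(like_level = ifelse(likes > 100, "> 100", "<= 100")) %>%
  group_by(like_level) %>%
  summarise(n = n(),                                   # photos in each group
            cor_views_downloads = cor(views, downloads))  # linear association

cor_by_group
```

A value close to 1 would suggest that downloads rise almost linearly with views in that group.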
After learning Module 3: Creating new variables and data tables, I know how to explore data by reading JSON, and I learned more functions, such as fromJSON() from the jsonlite package, pull(), filter(), mutate(), and group_by(). The functions I am most interested in are mutate() and filter(): by using these two functions to split the data into different levels, I can make the data clearer and the results easier to see. By combining these functions with what I learned in the two earlier modules, I can explore more data in different ways and of different types.
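As a small illustration of the functions mentioned above, the sketch below reads a JSON string and pulls one column from the filtered rows. The JSON text is invented for the example and only assumes the same `hits` structure used in this report.

```r
library(jsonlite)
library(dplyr)

# Hypothetical JSON with the same "hits" layout as the Pixabay data
json_text <- '{"hits":[{"likes":10,"views":100},{"likes":200,"views":900}]}'

hits <- fromJSON(json_text)$hits   # array of objects becomes a data frame

popular_views <- hits %>%
  filter(likes > 100) %>%          # keep photos with more than 100 likes
  pull(views)                      # extract the views column as a vector

popular_views
```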
What I want to know more about is graph databases and analytics. Graph databases are great for representing complex relationships and networks. I’m curious about their applications in various fields, such as social networks, recommendation systems, and fraud detection.
library(tidyverse)
library(jsonlite)
library(magick)
library(ggplot2)
json_data <- fromJSON("pixabay_data.json")
pixabay_photo_data <- json_data$hits
# select around 50 photos
# keep only the variables of interest
data <- pixabay_photo_data %>%
  select(downloads, likes, tags, views, pageURL, previewURL)
selected_photos <- data %>%
mutate(dog_tags = ifelse(str_detect(str_to_lower(tags),"dog|pet"),
"yes",
"no"))
# look at the 50th-highest download count to help keep around 50 photos
download_data <- selected_photos$downloads %>%
  sort(decreasing = TRUE)
download_data[50]
# keep photos with more than 13000 downloads
selected_photos <- selected_photos %>%
  filter(downloads > 13000)
write_csv(selected_photos, "selected_photos.csv")
#Part C
#summary
rows <- nrow(selected_photos)
cols <- ncol(selected_photos)
max_value <- max(selected_photos$downloads)
mean_likes <- selected_photos$likes %>% mean(na.rm = TRUE)
glimpse(selected_photos)
median_views <- median(selected_photos$views)
# mean downloads for photos with and without a dog/pet tag
tag_group <- selected_photos %>%
  group_by(dog_tags) %>%
  summarise(mean_downloads = mean(downloads))
#gif
image <- image_read(selected_photos$previewURL) %>%
image_scale(500) %>%
image_animate(fps = 1)
image_write(image, "my_photos.gif")
#creativity
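# label each photo by its level of likes (labels match the likes > 100 test)
selected_photos <- selected_photos %>%
  mutate(logic_like = ifelse(likes > 100,
                             "> 100",
                             "<= 100"))
# remove the outlier (the photo with the most downloads)
selected_photos <- selected_photos %>%
  filter(downloads < max(downloads))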
plot <- ggplot(selected_photos, aes(x = views,
                                    y = downloads,
                                    colour = logic_like)) +
  geom_point(pch = 10) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(vars(logic_like), scales = "free") +
  labs(title = "Downloads vs Views",
       x = "Views",
       y = "Downloads")